NSF PAR Search | NSF Public Access Repository


ConfliBERT-Spanish: A Pre-trained Spanish Language Model for Political Conflict and Violence

https://doi.org/10.1109/CiSt56084.2023.10409883

Yang, Wooseong; Alsarra, Sultan; Abdeljaber, Luay; Zawad, Niamat; Delaram, Zeinab; Osorio, Javier; Khan, Latifur; Brandt, Patrick T; D’Orazio, Vito (December 2023, Vakka)

This article introduces ConfliBERT-Spanish, a pre-trained language model specialized in political conflict and violence for Spanish-language text. Our methodology uses a large corpus on politics and violence to extend the capabilities of existing pre-trained models that process Spanish text. We compare ConfliBERT-Spanish against Multilingual BERT and BETO baselines on binary classification, multi-label classification, and named entity recognition. ConfliBERT-Spanish consistently outperforms the baseline models across all tasks. These results indicate that our domain-specific, language-specific cyberinfrastructure can substantially enhance the performance of NLP models for Latin American conflict analysis. This methodological advance opens broad opportunities for researchers and practitioners in the security sector to analyze large amounts of information effectively and with high accuracy, better equipping them to meet the dynamic and complex security challenges affecting the region.
ConfliBERT-Arabic: A Pre-trained Arabic Language Model for Politics, Conflicts and Violence

Alsarra, Sultan; Abdeljaber, Luay; Yang, Wooseong; Zawad, Niamat; Khan, Latifur; Brandt, Patrick; Osorio, Javier; D'Orazio, Vito (July 2023, Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing https://aclanthology.org/2023.ranlp-1.11)
Mitkov, Ruslan; Angelova, Galia (Eds.)
This study investigates the use of Natural Language Processing (NLP) methods to analyze politics, conflicts and violence in the Middle East using domain-specific pre-trained language models. Focusing on Arabic text, we present ConfliBERT-Arabic, a pre-trained language model that can efficiently analyze texts related to politics, conflict and violence. Our technique fine-tunes a pre-trained model on a corpus of Arabic texts about regional politics and conflicts. We compare the performance of our models with that of baseline BERT models. Our findings show that domain-specific pre-trained local language models enhance the performance of NLP models for Middle Eastern politics and conflict analysis. This study offers political and conflict analysts, including policymakers, scholars, and practitioners, new approaches and tools for deciphering the intricate dynamics of local politics and conflicts directly in Arabic.
ConfliBERT-Arabic: A Pre-trained Arabic Language Model for Politics, Conflicts and Violence

https://doi.org/10.26615/978-954-452-092-2_011

Alsarra, Sultan; Abdeljaber, Luay; Yang, Wooseong; Zawad, Niamat; Khan, Latifur; Brandt, Patrick T; Osorio, Javier; D’Orazio, Vito J (January 2023, INCOMA Ltd., Shoumen, Bulgaria)
